System and method for the creation and management of user-annotations associated with paper-based processes

ABSTRACT

A computer-implemented method for identifying constraints to reducing consumable usage includes acquiring print job information for a set of print jobs submitted for printing by a set of users. A print job representation is computed for each of the print jobs from the print job information. Provision is made for user-annotation of the submitted print jobs with a note that includes a reason for printing the print job. User-annotations are received for at least some of the submitted print jobs. The print jobs may be clustered into clusters based on the print job representations and annotations. A representation of the set of print jobs, including a user dashboard and a group dashboard, is generated based on the users' annotations and those of other users included in a group, to provide a structure for collaboration among the group members to identify the constraints to reducing consumable usage and/or provide solutions to reduce consumable usage.

CROSS REFERENCE TO RELATED PATENTS AND APPLICATIONS

U.S. Pat. No. 8,879,103, filed Nov. 4, 2014, by Willamowski et al., entitled “SYSTEM AND METHOD FOR HIGHLIGHTING BARRIERS TO REDUCING PAPER USAGE” and U.S. patent application Ser. No. 14/228,489, filed Mar. 28, 2014, by Malik et al., entitled “VARIABLE COLOR WIDGET AND MESSAGE PRESENTATION USER INTERFACE TO ENCOURAGE USERS TO CONSUME LESS PRINTING”, are incorporated herein by reference in their entirety.

BACKGROUND

The exemplary embodiment relates to creating and managing information associated with documents being processed and finds particular application in connection with a system and method for providing a structure to create and compile user-annotated notes associated with print jobs, the notes including a textual description of the reason(s) for printing the print jobs and/or solutions to reduce the consumption of consumables associated with printing the print jobs. This enables modifications to business processes to reduce paper usage.

In many contexts, such as the service industry, work is generally organized into processes that often entail printing documents. There is a growing trend towards replacing printing paper documents with digital counterparts, which may entail use of electronic signatures, email (instead of post mail) and online form filling. There are many reasons for this change, including higher productivity, cost-efficiency, and becoming more environmentally-friendly. Many large organizations are therefore looking for solutions to reduce paper usage and to move from using paper to digital documents. Unfortunately, especially in large organizations, it is often difficult to achieve this goal, because of a lack of information. Those in management, for example, often do not have a detailed understanding of where paper is being used by company employees, in particular, in which tasks or subtasks paper documents are generated, as well as how much paper is used in the process, in terms of the volume of paper being used in each of these tasks. Nor is there a good understanding of the reasons why paper is used for these tasks, i.e., what are the barriers that prevent using digital versions instead of paper documents within these tasks.

Having answers to these questions would help organizations to select which processes/tasks could be modified to facilitate moving them from paper to digital. However, without a good understanding of the paper consumption of the various tasks, and the reasons for printing documents, it is difficult to focus these efforts on the processes where changes would be the most effective.

The reasons for printing documents are often task dependent. Some common reasons involve requiring signatures, archiving, transitions between different computer systems, crossing organizational barriers, and so forth. However, there may be other reasons that have not been identified by the organization. To move from paper to digital, appropriate solutions may need to be implemented to replace the functions previously provided through generating paper documents, such as digital archiving, digital signatures, and the like. However, for some tasks, paper may afford benefits that digital documents do not provide. Paper is, for example, easily portable (e.g., when traveling), easy to read and annotate, and easy to hand over to another person. Employees could be provided with portable devices, such as eReaders, to address some of these issues, but this solution may not be cost-effective.

Currently, the transition from paper to digital is mainly achieved based on either ethnographic studies or consultancy: in these approaches typically an expert is sent to the site of the organization in order to study the existing work processes and to analyze these processes and the related tasks and constraints. In one study, management assumed that the paper consumption in the office was excessive and not really required for the work carried out by the employees. The extensive study carried out by ethnographers on site tended to disprove the assumption, but was time consuming to implement. See, Jacki O'Neill, David Martin, Tommaso Colombino, Antonietta Grasso, “A Little Knowledge is a Dangerous Thing?”, CHI 2011, Conference on Human Factors in Computing Systems, Vancouver BC, Canada, May 2011.

There remains a need for a system and method for associating different tasks within an organization with corresponding paper usages and usage rationales, so that candidate solutions can be evaluated and implemented efficiently.

INCORPORATION BY REFERENCE

O'NEILL et al., “WHEN A LITTLE KNOWLEDGE ISN'T A DANGEROUS THING”, CHI 2011, Conference on Human Factors in Computing Systems, Vancouver BC, Canada, 7-12 May 2011, is incorporated herein by reference in its entirety.

BRIEF DESCRIPTION

In one embodiment of this disclosure, described is a method for identifying constraints on reducing consumable usage comprising: acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job including a document to be printed; generating a print job contextual representation for each of the print jobs; providing for user-annotation of the submitted print jobs with a user-annotated note expressing a reason for printing the print job; receiving user-annotations for at least some of the submitted print jobs; and generating a user dashboard for each of the set of users, the user dashboard displaying the user's print consumption history, the set of users' print consumption history and one or more received user-annotated notes for the set of users, wherein the generating of a print job contextual representation, providing for user-annotation with a note, receiving of user-annotated notes, and generating of a user dashboard are performed with a computer processor.

In another embodiment of this disclosure, described is a system for identifying constraints on reducing consumable usage comprising: a job tracking component for acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job including a document to be printed; a print job contextual representation generation component for generating a print job contextual representation for each of the print jobs; an annotation component for receiving user-annotations for at least some of the submitted print jobs, the user-annotations including a user-annotated note expressing a reason for printing the print job; an analysis component for generating a representation of the set of print jobs which represents reasons for printing of print jobs based on the users' annotations; a user dashboard component for generating a user dashboard for each of the set of users, the user dashboard displaying the user's print consumption history, the set of users' print consumption history and one or more user-annotated notes for the set of users; and a processor which implements the job tracking component, print job contextual representation generation component, annotation component, analysis component, and user dashboard component.

In still another embodiment of this disclosure, described is a method for identifying constraints on reducing consumable usage comprising: acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job comprising a document to be printed; computing a print job representation for each of the print jobs based on features extracted from the print job information, the features including a statistical representation of low-level features extracted from patches of a page of the document; receiving user-annotations for at least some of the submitted print jobs whereby submitted print jobs are annotated with a user-annotated note expressing a reason for printing the print job; partitioning the print jobs into clusters based on the print job representations and annotations; and generating a representation of the set of print jobs which represents reasons for printing of print jobs in at least one of the clusters, based on the users' annotations, wherein the computing of the print job representation, receiving user-annotations, partitioning the print jobs, and generating of the representation of the set of print jobs are performed with a computer processor.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a graphical overview of a system and method for analyzing task-related printing;

FIG. 2 is a functional block diagram of a system for analyzing task-related printing in accordance with one aspect of the exemplary embodiment;

FIG. 3 is a flow chart of a method for analyzing task-related printing in accordance with another aspect of the exemplary embodiment;

FIG. 4 is an example of a prompt to a user to create a note on the fly according to an exemplary embodiment of this disclosure;

FIG. 5 is an example of a user note created on the fly according to an exemplary embodiment of this disclosure;

FIG. 6 is an example of a system generated visual indicator to a user highlighting a particular consumption pattern with an associated timeline according to an exemplary embodiment of this disclosure;

FIG. 7 is an example of a user dashboard according to an exemplary embodiment of this disclosure;

FIG. 8 illustrates a structure of a user dashboard according to an exemplary embodiment of this disclosure;

FIG. 9 is an example of a user dashboard including a note pan (bottom left) and a textual pattern description (bottom right) according to an exemplary embodiment of this disclosure;

FIG. 10 is an example of an added user note according to an exemplary embodiment of this disclosure;

FIG. 11 is an example of a group dashboard according to an exemplary embodiment of this disclosure; and

FIG. 12 is an example of a filtering pan according to an exemplary embodiment of this disclosure.

DETAILED DESCRIPTION

As discussed in the background section, companies want to reduce paper usage but lack a detailed understanding of the various paper-based processes and printing requirements in place within their companies. Company employees, who actually carry out the work and are confronted daily with these paper-based processes, are in the best position to identify existing pain points with respect to paper usage and printing and to elaborate realistic process improvement options to reduce paper usage. Companies thus need a way to enable, engage and motivate their employees to contribute, and to provide their input into such a dedicated “collaborative process improvement suggestion box”.

A system and method are disclosed that analyze document printing in a way which enables informed decisions to be made by decision-makers, such as managers and other organization individuals, to support an effective move from printed paper to digital documents.

Specifically, a note-taking management system and method is disclosed based on the creation, structuring, and management of employee annotations addressing process limits and/or suggesting improvements regarding printing and paper usage.

More specifically, an exemplary system and method provides the following functionality:

1. Identifying employees' typical, atypical, and/or particularly costly printing consumption patterns from their long and short term print history.

2. Pointing out those patterns to users, prompting and motivating them to take semi-structured notes on their printing behaviors and paper usage issues through two distinct procedures:

a. On the fly: right after issuing a print job, the system prompts users for a note.

b. A posteriori: the system confronts the user with a high-level textual description and graphic visualization of the detected pattern and prompts users to respond with an explanatory note.

In both cases the system associates the note taken by the user with its context, e.g., the corresponding print job with its meta-data or the particular consumption pattern, for instance typically heavy color printing toward the end of the month.

3. Sharing contextualized notes with peers. Peers can thus review and support each other's notes. The system therefore displays each note with its corresponding context, i.e., type of job, printing pattern, etc., in a personal dashboard.

4. Facilitating peer group discussions by assembling and aggregating all the notes taken into a group dashboard. The group dashboard facilitates grouping notes according to different attributes, i.e., type of user, type of print job, print settings used, etc., in order to elaborate acceptable improvement suggestions.
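By way of illustration only, the association of a note with its context (item 2) and the grouping of notes by attribute for a group dashboard (item 4) can be sketched as follows. This is a minimal sketch, not the disclosed implementation: the `Note` class, its field names, the `group_notes` helper, and the sample notes are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Note:
    """A user note plus the context it was taken in (hypothetical fields)."""
    author: str
    text: str
    # Context such as job type, print settings, or the detected pattern.
    context: dict = field(default_factory=dict)


def group_notes(notes, attribute):
    """Group notes by one context attribute, as a group dashboard might."""
    groups = defaultdict(list)
    for note in notes:
        groups[note.context.get(attribute, "unknown")].append(note)
    return dict(groups)


# Invented sample data for illustration.
notes = [
    Note("alice", "Contract needs a handwritten signature.",
         {"job_type": "contract", "color": False}),
    Note("bob", "Client insists on paper copies.",
         {"job_type": "contract", "color": False}),
    Note("carol", "Charts are unreadable in monochrome.",
         {"job_type": "report", "color": True}),
]
by_type = group_notes(notes, "job_type")
```

Notes lacking the requested attribute fall into an "unknown" group rather than being dropped, so that no employee input is lost when the grouping attribute changes.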

Overall, the provided system and method covers the entire process from detecting employees' printing patterns to gathering employee notes in response, elaborated in group discussion(s), to finally identifying viable process improvement suggestions.
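As one possible illustration of the first stage of this process, atypical consumption can be flagged by comparing each day's page count against the user's own history. The sketch below assumes a simple mean-plus-k-standard-deviations rule; the disclosure does not specify the detection rule, and the function name, the threshold `k`, and the sample history are hypothetical.

```python
from statistics import mean, pstdev


def atypical_days(pages_per_day, k=2.0):
    """Return the days whose page count exceeds mean + k * standard deviation.

    pages_per_day maps a day label to the number of pages printed that day.
    The rule and the default k are assumptions made for this sketch.
    """
    counts = list(pages_per_day.values())
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []  # perfectly uniform history: nothing stands out
    return [day for day, n in pages_per_day.items() if n > mu + k * sigma]


# Invented history: nine ordinary days and one heavy printing day.
history = {"d1": 12, "d2": 9, "d3": 11, "d4": 10, "d5": 8,
           "d6": 10, "d7": 12, "d8": 9, "d9": 10, "d10": 95}
flagged = atypical_days(history)
```

A flagged day could then be presented to the user with a prompt for an explanatory note, as in the a posteriori procedure described above.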

Experiments have demonstrated that employees are motivated to give feedback on their printing pain points and to elaborate and adopt improvements to reduce paper usage wherever possible. This behavior was observed typically whenever individuals owned a corresponding workflow and when this workflow was of limited complexity. Such an individual approach is not possible for more complex workflows that are often put in place by an employer and involve several employees. In such cases, it is more difficult if not impossible for the individual alone to fully grasp and assess the root causes and constraints for paper usage, and, much less, to propose an appropriate solution or process improvement.

Therefore, prompting employees for information, followed by a discussion with colleagues, can be a key enabler to clarify issues and to elaborate appropriate solutions. Indeed, different employees see an issue from different perspectives, and a joint discussion has the benefit of crystallizing its description and of specifying its characteristics more precisely (e.g., in terms of the workflow concerned, its frequency, the volume generated, etc.). Group discussions also foster the individuals' contributions with respect to what constitutes an appropriate, realistic and acceptable improvement of the concerned paper-based workflow. Valuing the individuals' contributions will in turn encourage them to actively participate in the suggestion process continuously over time.

Provided herein is a system and method supporting the entire process: from individuals' annotation of pain points to collective discussion and solutions. Concerning the note taking, the system disclosed lets individual employees take notes either on the fly (when carrying out the work, and printing) or a posteriori, based on a textual description and graphic visualization of print history information. Individuals then share their notes with their peers. Peer groups finally use the system to frame and characterize issues raised by user notes and then to elaborate appropriate solutions based on the individuals' notes.

According to another aspect of this disclosure, an exemplary method identifies recurring paper-based tasks by storing and analyzing print logs, estimates the impact of each task in terms of consumable usage, such as in terms of paper volume and/or power consumption, and identifies constraints that explain the reasons for printing, allowing identification of the barriers that prevent moving these tasks from paper to digital form.

A digital document includes one or more digital pages in electronic form. Document printing refers to the rendering of a digital document in hardcopy, e.g., paper form. Document printing may be quantified in terms of usage of one or more consumables employed in the output of printed documents.

Example consumables consumed in printing include print media, e.g., paper; marking materials, such as inks or toners; energy consumed by an output device, such as a printer, or the like in the rendering of a digital document in hardcopy, or a combination thereof. In the exemplary embodiment, usage of the consumable is quantified, at least in part, in terms of an amount of paper used in printing a document, such as a number of sheets of paper or a volume or weight of paper used (allowing different sizes and/or densities of the sheets to be accounted for), although other quantifiable measures may be employed, such as units of electric power consumed in printing (which may take into account that different printers/printing modes consume different amounts of energy), weight or volume of marking material (e.g., based on degree of coverage of each page, or an estimated average consumed per page), or by computing an overall cost per digital page printed, which may factor in type of print medium (e.g., bond vs. regular paper), whether monochrome or color printing is used, which of a plurality of printers is used in printing (where each printer has a different energy consumption), whether duplex or simplex printing is employed, or a combination of these and/or other factors directly or indirectly related to the quantity of one or more consumables used.
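As a non-limiting illustration, the sheet count and an overall cost for a print job might be estimated from a few of the factors listed above. The sketch assumes a simplified cost model; the `JobSettings` class, its fields, and all unit costs are hypothetical placeholders rather than values taken from this disclosure.

```python
from dataclasses import dataclass
from math import ceil

# Hypothetical unit costs; real values depend on the printers and supplies.
SHEET_COST = 0.01        # per sheet of paper
MONO_PAGE_COST = 0.02    # marking material per monochrome page
COLOR_PAGE_COST = 0.10   # marking material per color page


@dataclass
class JobSettings:
    pages: int            # digital pages in the document
    copies: int = 1
    duplex: bool = False
    color: bool = False


def sheets_used(job):
    """Sheets of paper consumed: duplex printing puts two pages on a sheet."""
    per_copy = ceil(job.pages / 2) if job.duplex else job.pages
    return per_copy * job.copies


def job_cost(job):
    """Paper cost plus marking-material cost for every printed page."""
    marking = COLOR_PAGE_COST if job.color else MONO_PAGE_COST
    return sheets_used(job) * SHEET_COST + job.pages * job.copies * marking


duplex_job = JobSettings(pages=5, copies=2, duplex=True)
```

Energy consumption and print medium type could be added as further terms in `job_cost` in the same way.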

Each digital document can be in any convenient file format for printing, such as Word, PowerPoint, Spreadsheet, JPEG, Graphics Interchange Format (GIF), JBIG, Windows Bitmap Format (BMP), Tagged Image File Format (TIFF), JPEG File Interchange Format (JFIF), Delrin Winfax, PCX, Portable Network Graphics (PNG), DCX, G3, G4, G3 2D, Computer Aided Acquisition and Logistics Support Raster Format (CALS), Electronic Arts Interchange File Format (IFF), IOCA, PCD, IGF, ICO, Mixed Object Document Content Architecture (MO:DCA), Windows Metafile Format (WMF), ATT, BRK, CLP, LV, GX2, IMG (GEM), IMG (Xerox), IMT, KFX, FLE, MAC, MSP, NCR, Portable Bitmap (PBM), Portable Greymap (PGM), SUN, PNM, Portable Pixmap (PPM), Adobe Photoshop (PSD), Sun Rasterfile (RAS), SGI, X BitMap (XBM), X PixMap (XPM), X Window Dump (XWD), AFX, Imara, Exif, WordPerfect Graphics Metafile (WPG), Macintosh Picture (PICT), Encapsulated PostScript (EPS), combination thereof, or other common file format used for documents. In general, each page of the document may include one or more of text, raster graphics, and images, each image including image data, e.g., as an array of pixels.

A print job generally includes a printing object, which includes the digital document to be printed in a format recognized by the selected printer, e.g., Postscript, together with a job ticket, which provides information about the print job that will be used to control how the job is processed, such as number of copies, double or single sided printing, color or monochrome printing, paper type, specific printer selected, and so forth.

Print logs are records of print jobs that have been printed on a printer (or which are being sent for printing on a printer) and which contain information about the print jobs, such as the document name, user ID, image data for each page of the document, job ticket information, and so forth.

A task, as used herein, can be any function which entails generating a record of a document in at least one of digital and paper form and which is performed repeatedly by at least one and generally by multiple people within an organization. More particularly, in the exemplary embodiment, a task can be defined in terms of a cluster of similar documents, one or more of which may have been manually assigned a task label selected from a set of task labels. Document similarity can be determined based on one or more document features, as described in further detail below. The manually assigned task labels may be assigned by a user, such as a person who has selected a particular document for printing, or, in some cases, the user may be another person, who reviews the documents that have been printed or associated stored information, e.g., derived from the print logs.

The information generated in the exemplary method can assist management in making appropriate and informed decisions, selecting the tasks that matter most in terms of consumable usage, and removing the corresponding barriers that prevented the move from paper to digital, for example, by investing in and modifying the organization's infrastructure and modifying business processes accordingly.

With reference to FIG. 1, an overview of an exemplary print analysis system 10 and method is shown. The exemplary system 10 tracks users' print jobs and computes features for some or all of them. It then combines print job representations and user annotations of print jobs to provide information about paper-based tasks, the consumables they represent, and the reasons why they are required in paper format.

The system 10 includes a print job tracking component 12 that intercepts print jobs 14 that are sent by different users 16 within the organization to a printing infrastructure 18 (and/or which receives information on the print jobs from the printing infrastructure, such as print logs). The number of users and print jobs is not limited but may include at least 2 or at least 5, or at least 10 and up to 100 or more users, each generating one or more print jobs for printing on the printing infrastructure 18, for example, over a selected time period, such as a day, week, month, or the like. In the exemplary embodiment, the number of print jobs may be at least 10, or at least 100, or up to 1000 or more.

A features extractor 20 extracts and computes, for each individual print job, a print job representation (or “signature”) comprising a set of feature descriptors (such as a user ID, a printing date and time, a document title, a document length, visual and/or textual document content-based features, or a combination thereof). The print job signature can be a vectorial representation of information extracted from the print job.
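For illustration, a vectorial signature could be assembled from a subset of these descriptors as sketched below. The sketch is an assumption, not the disclosed features extractor: the function name, the scaling of the time features, and the hashed title-word histogram (with its bucket count) are hypothetical choices, and the user ID and content-based features are omitted for brevity.

```python
import zlib
from datetime import datetime

TITLE_BUCKETS = 8  # hypothetical size of the hashed title-word histogram


def job_signature(printed_at, title, num_pages):
    """Return a fixed-length list of floats describing one print job."""
    title_hist = [0.0] * TITLE_BUCKETS
    for word in title.lower().split():
        # crc32 is used rather than hash() so buckets are stable across runs.
        bucket = zlib.crc32(word.encode("utf-8")) % TITLE_BUCKETS
        title_hist[bucket] += 1.0
    return [
        printed_at.hour / 23.0,      # time of day, scaled to [0, 1]
        printed_at.weekday() / 6.0,  # day of week, scaled to [0, 1]
        float(num_pages),            # document length
    ] + title_hist


sig = job_signature(datetime(2014, 3, 28, 14, 30), "Monthly Sales Report", 12)
```

Because every signature has the same length, signatures of different print jobs can be compared directly with a vector similarity measure.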

An annotation component 22 provides for users 16 to annotate at least some of the print jobs 14 with a user-annotated note which relates to the constraints (reasons or barriers) as to why the corresponding document was considered beneficial to be printed or was required to be in paper rather than simply digital format.

An optional clustering component 24 identifies clusters 26 of similar print jobs 14. The clustering is based on the assumption that similar print jobs will belong to similar tasks and that users have work roles corresponding to a specific subset of tasks and thus print essentially the corresponding types of print jobs. Thus, print jobs which have no annotations can be clustered based on the similarity of their print job signatures to those of annotated jobs.
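One simple way to realize the last point, offered here only as an assumed stand-in for the clustering component, is to give each unannotated job the label of its most similar annotated job. The sketch uses cosine similarity and a nearest-neighbor rule; neither is stated in the disclosure, and the function names and sample signatures are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_labels(annotated, unannotated):
    """Label each unannotated signature with its nearest annotated neighbor.

    annotated is a list of (signature, label) pairs; unannotated is a list
    of signatures. Returns one label per unannotated signature.
    """
    return [
        max(annotated, key=lambda pair: cosine(sig, pair[0]))[1]
        for sig in unannotated
    ]


# Invented three-dimensional signatures for illustration.
annotated = [([1.0, 0.0, 0.0], "invoice"), ([0.0, 1.0, 1.0], "report")]
unannotated = [[0.9, 0.1, 0.0], [0.0, 0.8, 1.0]]
labels = propagate_labels(annotated, unannotated)
```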

As illustrated in FIG. 2, the system 10 may suitably be hosted by one or more computing devices 30. For example, the system 10 includes main memory 32 which stores instructions 34 for performing the exemplary method, including the print job tracking component 12, features extractor 20, annotation component 22, and clustering component 24, described above with reference to FIG. 1.

An analysis component 36 generates task-related information 38, based on the clustering and annotations, which is output from the system 10. In the exemplary embodiment, the components 12, 20, 22, 24, 36 are in the form of software which is implemented by a computer processor 40 in communication with memory 32.

In the illustrated embodiment, the computing device 30 receives print job information comprising print jobs 14, and/or information extracted therefrom, such as print logs 41, via a network. In one embodiment the print jobs 14 are received by the job tracking component 12 from a plurality of client computing devices 44, 46, 48 linked to the network, that are used by the respective users 16 to generate print jobs. However, it is to be appreciated that print job information for the submitted print jobs 14 may alternatively or additionally be received from the printing infrastructure 18 or from a print job server (not shown), which distributes the print jobs 14 to the various printers in printing infrastructure 18. The print job information 14, 41 is received by the system 10 via one or more input/output (I/O) interfaces 50, 52 and stored in data memory 54 of the system 10 during processing. The computing device 30 also may control the distribution of the received print jobs 14 to respective printers 56, 58 of the printing infrastructure 18, or this function may be performed by another computer on the network.

The features extractor 20 extracts features from the print job information. The extracted features are used to generate a representation 60 of each print job, which may be stored in memory 54.

The annotation component 22 receives, as input, print job annotations 62 for at least some of the print jobs 14, via the network, e.g., from the client computing devices 44, 46, 48 and stores the annotations, or information extracted from them, in memory 54. The annotations may include task-related information and/or information, provided in the form of a note, on constraints which limit or prevent the user's ability to use a digital version of the printed document rather than printing a paper copy. Alternatively, the task-related information may include a task category selected from a plurality of task categories, or information from which the task category may be inferred. The constraint-related information may include a constraint category selected from a plurality of constraint categories, or information from which the constraint category may be inferred.

The clustering component 24 may be trained on the annotated (labeled) print jobs and is then able to cluster a set of labeled and unlabeled print jobs into a plurality of clusters 26. Hardware components 32, 40, 50, 52, 54 may communicate via a data/control bus 64. The processor 40 executes the instructions for performing the method outlined in FIG. 3.

The client devices 44, 46, 48 may each communicate with one or more of a display 66, for displaying information to users, and a user input device 68, such as a keyboard or touch or writable screen, a cursor control device, such as mouse or trackball, a speech to text converter, or the like, for inputting text and for communicating user input information and command selections to the respective computer processor and to processor 40 via network.

The computing device 30 may be a PC, such as a server computer, a desktop, laptop, tablet, or palmtop computer, a portable digital assistant (PDA), a cellular telephone, a pager, combination thereof, or other computing device capable of executing instructions for performing the exemplary method.

The memory 32, 54 may represent any type of non-transitory computer readable medium such as random access memory (RAM), read only memory (ROM), magnetic disk or tape, optical disk, flash memory, or holographic memory. In one embodiment, the memory 32, 54 comprises a combination of random access memory and read only memory. In some embodiments, the processor 40 and memory 32 may be combined in a single chip. The network interface 50, 52 allows the computer 30 to communicate with other devices via a computer network 42, such as a local area network (LAN) or wide area network (WAN), or the internet, and may comprise a modulator/demodulator (MODEM), a router, a cable, and/or Ethernet port. Memory 32, 54 stores instructions for performing the exemplary method as well as the processed data 38.

The digital processor 40 can be variously embodied, such as by a single-core processor, a dual-core processor (or more generally by a multiple-core processor), a digital processor and cooperating math coprocessor, a digital controller, or the like. The exemplary digital processor 40, in addition to controlling the operation of the computer 30, executes instructions 34 stored in memory 32 for performing the method outlined in FIG. 3.

The client devices 44, 46, 48 may be configured as for computing device 30, except as noted.

The term “software,” as used herein, is intended to encompass any collection or set of instructions executable by a computer or other digital system so as to configure the computer or other digital system to perform the task that is the intent of the software. The term “software” as used herein is intended to encompass such instructions stored in storage medium such as RAM, a hard disk, optical disk, or so forth, and is also intended to encompass so-called “firmware” that is software stored on a ROM or so forth. Such software may be organized in various ways, and may include software components organized as libraries, Internet-based programs stored on a remote server or so forth, source code, interpretive code, object code, directly executable code, and so forth. It is contemplated that the software may invoke system-level code or calls to other software residing on a server or other location to perform certain functions.

As will be appreciated, FIG. 2 is a high level functional block diagram of only a portion of the components which are incorporated into a computer system 10. Since the configuration and operation of programmable computers are well known, they will not be described further.

With reference to FIG. 3, a method for analysis of the reasons for printing print jobs is shown, which can be performed with the system of FIG. 2. The method begins at S100.

At S102, print job information 14, 41 is acquired for a collection of print jobs generated by a set of users 16, such as company employees, and stored in computer memory 54.

At S104, provision is made for the employees 16 to annotate their print jobs, e.g., via a graphical user interface generated on the user's client device 44, 46, or 48. The annotations include a user-annotated note which explains why the printing of the job was needed or beneficial.

At S106, user annotations 62 are received by the system 10 and stored in memory.

At S108, for each of a set of the print jobs, a print job representation 60 is generated, which includes features extracted from the print job information received at S102.

At S114, consumable usage is computed, by the analysis component 36, for the print jobs in the clusters. In one embodiment, the number of pages in each print job in the set of submitted jobs is stored in memory and the total number of print job pages is computed for the print jobs.
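As a minimal sketch of the page totals computed at S114, assuming each print job has already been assigned a cluster identifier, the totals can be accumulated per cluster and ranked. The function name, the (cluster, pages) input format, and the sample data are hypothetical.

```python
from collections import Counter


def pages_per_cluster(jobs):
    """Sum page counts per cluster from (cluster_id, pages) pairs."""
    totals = Counter()
    for cluster_id, pages in jobs:
        totals[cluster_id] += pages
    return totals


# Invented sample of clustered print jobs.
jobs = [("invoices", 3), ("reports", 40), ("invoices", 2), ("reports", 25)]
totals = pages_per_cluster(jobs)
ranked = totals.most_common()  # clusters ordered by total pages, largest first
```

Ranking the clusters in this way indicates which tasks account for the most paper and are therefore candidates for closer review.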

At S116, information is generated based on the user annotated notes,such as a representation of the constraints and/or features for theprint jobs.

At S118, information 38 is output.

The method ends at S120.

The method illustrated in FIG. 3 may be implemented in a computer program product that may be executed on a computer. The computer program product may comprise a non-transitory computer-readable recording medium on which a control program is recorded (stored), such as a disk, hard drive, or the like. Common forms of non-transitory computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, or any other non-transitory medium from which a computer can read and use.

Alternatively or additionally, the method may be implemented in transitory media, such as a transmittable carrier wave in which the control program is embodied as a data signal using transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like.

The exemplary method may be implemented on one or more general purposecomputers, special purpose computer(s), a programmed microprocessor ormicrocontroller and peripheral integrated circuit elements, an ASIC orother integrated circuit, a digital signal processor, a hardwiredelectronic or logic circuit such as a discrete element circuit, aprogrammable logic device such as a PLD, PLA, FPGA, Graphical card CPU(GPU), or PAL, or the like. In general, any device, capable ofimplementing a finite state machine that is in turn capable ofimplementing the flowchart shown in FIG. 3, can be used to implement themethod. As will be appreciated, while the steps of the method may all becomputer implemented, in some embodiments one or more of the steps maybe at least partially performed manually.

Further details of the system and method will now be described.

Print Job Tracking (S102)

Print job tracking systems that provide the basic functionality of the exemplary print job tracking component 12, such as intercepting print jobs issued through a print infrastructure 18 and extracting the corresponding user name, document title, document length, and similar information, are readily available. For example, a device management program, such as the Xerox Device Manager (XDM), accessible through Xerox CentreWareWeb™, can be installed in a printer network. Such a monitoring system 12 is able to mine information regarding print jobs as well as to intercept data to be printed and to store it in the form of a PDF/PS file.

The functionality of such a system can be enhanced with instructions to compute additional features, such as visual or textual document content and/or layout from the print jobs, so that the layout and content of the printed documents can also be taken into account. This can be performed by rendering the page description language (PDL) document and then applying OCR or visual feature extraction to one or more of the rendered document pages. The features extracted can then be used in computing a word-based or image feature-based representation of the document page. Other features can be based on color or black & white pixel coverage, which helps to estimate the type of document: drawing, text or graphics. These representations, or features extracted therefrom, can serve as features of the print job representation.

Feature Extraction (S108)

Exemplary features used in clustering the print jobs can be selected from:

1. User ID, such as the user name of the user submitting the print job for printing.

2. Print job submission time.

3. Document title, which may be extracted from the filepath of the document.

4. Document length, which may be expressed in terms of a number of sheets in the job being printed.

5. Print job type selected from a predefined set of job types (e.g., selected from Email; spreadsheet, such as Excel; graphics; PDF; PowerPoint; RTF; Text; drawing program, such as Visio or Chemdraw; Web page; Word; other).

6. Textual content features, such as word frequencies of each of a selected set of words, extracted from the title and/or content of the printed document.

7. Visual content features, such as features based on color and/or gradient of pixels of patches of a document page image.

8. Coverage features, such as the number/proportion of pixels which are “on” (having a color) or the number/proportion of pixels of each of the color separations (e.g., C, M, Y, and optionally K).

In some embodiments, at least two, or at least three, or at least four, or all of these feature types are extracted for each print job. Some of the features may be generated by the job tracking component, as discussed above. Other features may be extracted by the feature extractor. The feature values acquired by the job tracking component, and/or feature extractor, may each be normalized to a common range of values, such as 0-1. The print job representation as a whole may also be normalized so the values sum to 1, or some other normalization is performed. Some of the features may be weighted, in the print job representation, to reflect their relative importance, although the clustering component may also learn which features are most important for clustering the jobs and weight them accordingly.

For generating print job type features, each print job type may be a separate feature in the representation and a value of 1 can be accorded if the job is of that type, 0 otherwise.
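By way of illustration only, the job type indicator features and the normalizations described above can be sketched as follows. The list `JOB_TYPES`, the function names, and the choice of min-max scaling for the 0-1 range are assumptions made for this sketch and are not specified by the exemplary embodiment.

```python
# Hypothetical predefined set of job types (names chosen for illustration).
JOB_TYPES = ["email", "spreadsheet", "graphics", "pdf", "powerpoint",
             "rtf", "text", "drawing", "web page", "word", "other"]


def job_type_features(job_type):
    """One feature per job type: 1 if the job is of that type, 0 otherwise."""
    key = job_type.lower()
    if key not in JOB_TYPES:
        key = "other"  # unknown types fall back to the catch-all category
    return [1 if t == key else 0 for t in JOB_TYPES]


def normalize_to_unit_range(values):
    """Scale one feature's values over a set of jobs to the range 0-1
    (min-max scaling is an assumption; other schemes are possible)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def normalize_to_unit_sum(representation):
    """Normalize a whole print job representation so its values sum to 1."""
    total = sum(representation)
    if total == 0:
        return list(representation)
    return [v / total for v in representation]
```

For instance, `job_type_features("PDF")` yields a vector with a single 1 at the position of the PDF type, and document lengths of 2, 4 and 6 sheets map to 0, 0.5 and 1 under `normalize_to_unit_range`.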

For extracting a document title from a filepath, the document title may be taken from the last forward slash to the final period. In other embodiments, the title may be stored as metadata, or in other information associated with the document. The identified document title may be split into words and a histogram representation generated of the words that it contains. The histogram may represent a limited set of words, such as those expected to be found in document titles, and may exclude stop words which are too frequent to be discriminative.
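The title extraction and word histogram just described can be sketched as below. This is a minimal sketch: the stop word list, the tokenization rule (splitting on non-alphanumeric characters), and the function names are assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical stop word list; a deployed system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "to", "in"}


def title_from_filepath(path):
    """Take the text from the last forward slash to the final period."""
    start = path.rfind("/") + 1      # 0 when the path has no slash
    end = path.rfind(".")
    if end < start:                  # no period in the last path segment
        end = len(path)
    return path[start:end]


def title_histogram(title, vocabulary=None):
    """Count title words, excluding stop words; optionally keep only
    words from a limited vocabulary."""
    words = [w for w in re.split(r"[^a-z0-9]+", title.lower()) if w]
    counts = Counter(w for w in words if w not in STOP_WORDS)
    if vocabulary is not None:
        counts = Counter({w: c for w, c in counts.items() if w in vocabulary})
    return counts
```

Under these assumptions, the path `/home/u/docs/travel_request_form.v2.pdf` gives the title `travel_request_form.v2`, which is then split into the words travel, request, form and v2.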

Methods for extracting features from text are described, for example, in U.S. Pub. No. 20100070521, published Mar. 18, 2010, entitled QUERY TRANSLATION THROUGH DICTIONARY ADAPTATION, and U.S. Pub. No. 20100082615, published Apr. 1, 2010, entitled CROSS-MEDIA SIMILARITY MEASURES THROUGH TRANS-MEDIA PSEUDO-RELEVANCE FEEDBACK AND DOCUMENT RERANKING, both by Stephane Clinchant, et al., the disclosures of which are incorporated herein by reference in their entireties. The representation can be a bag-of-words representation which is based on the number of occurrences of each of a set of words in the document page or set of pages.

For generation of an image representation of one or more pages of the document to be printed, the feature extractor may generate any suitable high level statistical representation of an image constituted by the document page or part thereof, such as a multidimensional vector generated based on features extracted from the image. Fisher Kernel representations and Bag-of-Visual-Word representations are exemplary of suitable high-level statistical representations which can be used herein. The exemplary representations are of a fixed dimensionality, i.e., each image signature has the same number of elements.

For example, the feature extractor 20 includes a patch extractor, which extracts and analyzes low level visual features of patches of the image, such as shape, texture, or color features, or the like. The patches can be obtained by image segmentation, by applying specific interest point detectors, by considering a regular grid, or simply by the random sampling of image patches. In the exemplary embodiment, the patches are extracted on a regular grid, optionally at multiple scales, over the entire image, or at least a part or a majority of the image.

The extracted low level features (in the form of a local descriptor, such as a vector or histogram) from each patch can be concatenated and optionally reduced in dimensionality, to form a features vector which serves as the global image signature. In other approaches, the local descriptors of the patches of an image are assigned to clusters. For example, a visual vocabulary is previously obtained by clustering local descriptors extracted from training images, using for instance K-means clustering analysis. Each patch vector is then assigned to a nearest cluster and a histogram of the assignments can be generated. In other approaches, a probabilistic framework is employed. For example, it is assumed that there exists an underlying generative model, such as a Gaussian Mixture Model (GMM), from which all the local descriptors are emitted. Each patch can thus be characterized by a vector of weights, one weight for each of the Gaussian functions forming the mixture model. In this case, the visual vocabulary can be estimated using the Expectation-Maximization (EM) algorithm. In either case, each visual word in the vocabulary corresponds to a grouping of typical low-level features. The visual words may each correspond (approximately) to a mid-level image feature such as a type of visual (rather than digital) object (e.g., ball or sphere, rod or shaft, flower, autumn leaves, etc.), characteristic background (e.g., starlit sky, blue sky, grass field, snow, beach, etc.), or the like. Given an image to be assigned a representation, each extracted local descriptor is assigned to its closest visual word in the previously trained vocabulary or to all visual words in a probabilistic manner in the case of a stochastic model. A histogram is computed by accumulating the occurrences of each visual word. The histogram can serve as the image representation or input to a generative model which outputs an image representation based thereon.
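As a minimal sketch of the hard-assignment case described above, the following assigns each local descriptor to its nearest visual word and accumulates a histogram. It assumes the vocabulary (e.g., K-means cluster centers) has already been trained elsewhere; the function names, the squared Euclidean distance, and the final normalization to unit sum are choices made for this illustration.

```python
def nearest_visual_word(descriptor, vocabulary):
    """Index of the visual word (cluster center) closest to the descriptor."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda k: squared_distance(descriptor, vocabulary[k]))


def bag_of_visual_words(descriptors, vocabulary):
    """Histogram of visual word occurrences, normalized to sum to 1.

    The result has one element per visual word, so every image obtains
    a signature of the same fixed dimensionality.
    """
    histogram = [0.0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_visual_word(descriptor, vocabulary)] += 1.0
    total = sum(histogram)
    return [h / total for h in histogram] if total else histogram
```

With a two-word vocabulary `[[0, 0], [10, 10]]` and four descriptors of which two lie near each word, the signature is `[0.5, 0.5]`.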

For example, as local descriptors extracted from the patches, SIFT descriptors or other gradient-based feature descriptors can be used. See, e.g., Lowe, “Distinctive image features from scale-invariant keypoints,” IJCV vol. 60 (2004). In one illustrative example employing SIFT features, the features are extracted from 32×32 pixel patches on regular grids (every 16 pixels) at five scales, using 128-dimensional SIFT descriptors. Other suitable local descriptors which can be extracted include simple 96-dimensional color features in which a patch is subdivided into 4×4 sub-regions and in each sub-region the mean and standard deviation are computed for each of the channels (e.g., three: R, G and B, in the case of color documents, or a single channel in the case of monochrome images). These are merely illustrative examples, and additional and/or other features can be used. The number of features in each local descriptor is optionally reduced, e.g., to 64 dimensions, using Principal Component Analysis (PCA). Signatures can be computed for two or more regions of the image and aggregated, e.g., concatenated.
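The dimensionality of the simple color descriptor follows from 4×4 sub-regions × 3 channels × 2 statistics = 96. A minimal sketch is given below, assuming the patch is supplied as rows of per-pixel channel tuples and that the population standard deviation is used; neither detail is specified above, and the function name is hypothetical.

```python
from statistics import fmean, pstdev


def color_descriptor(patch, grid=4):
    """Mean and standard deviation per channel in each grid x grid sub-region.

    `patch` is a list of rows, each row a list of pixels, each pixel a
    tuple of channel values, e.g. (R, G, B) or (gray,).
    """
    height, width = len(patch), len(patch[0])
    channels = len(patch[0][0])
    descriptor = []
    for gy in range(grid):
        rows = range(gy * height // grid, (gy + 1) * height // grid)
        for gx in range(grid):
            cols = range(gx * width // grid, (gx + 1) * width // grid)
            for c in range(channels):
                values = [patch[y][x][c] for y in rows for x in cols]
                descriptor.append(fmean(values))
                descriptor.append(pstdev(values))
    return descriptor
```

For a 32×32 RGB patch this produces 96 values; for a single-channel (monochrome) patch of the same size it produces 32.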

In some illustrative examples, a Fisher vector is computed for the image by modeling the extracted local descriptors of the image using a mixture model to generate a corresponding image vector having vector elements that are indicative of parameters of mixture model components of the mixture model representing the extracted local descriptors of the image. The exemplary mixture model is a Gaussian mixture model (GMM) comprising a set of Gaussian functions (Gaussians) to which weights are assigned in the parameter training. Each Gaussian is represented by its mean vector and covariance matrix. It can be assumed that the covariance matrices are diagonal. See, e.g., Perronnin, et al., “Fisher kernels on visual vocabularies for image categorization” in CVPR (2007). Methods for computing Fisher vectors are more fully described in U.S. Pub. No. 20120045134, published Feb. 23, 2012, entitled LARGE SCALE IMAGE CLASSIFICATION, by Florent Perronnin, et al., and U.S. Pub. No. 20120076401, published Mar. 29, 2012, entitled IMAGE CLASSIFICATION EMPLOYING IMAGE VECTORS COMPRESSED USING VECTOR QUANTIZATION, by Jorge Sanchez, et al., and in Jorge Sanchez, and Thomas Mensink, “Improving the fisher kernel for large-scale image classification,” in Proc. 11th European Conference on Computer Vision (ECCV): Part IV, pages 143-156 (2010); Perronnin, F. and Liu, Y. and Sanchez, J. and Poirier, H., “Large-scale image retrieval with compressed fisher vectors,” in Proc. of Computer Vision and Pattern Recognition (CVPR), pp. 3384-3391, 2010; and Jorge Sanchez and Florent Perronnin, “High-dimensional signature compression for large-scale image classification,” in CVPR 2011, the disclosures of which are incorporated herein by reference in their entireties. The trained GMM is intended to describe the content of any image within a range of interest (for example, different types of document pages, including images, text, emails, and the like).

Other methods of generating image representations which can be used as features herein are described in U.S. Pub. Nos. 20030021481; 2007005356; 20070258648; 20080069456; 20080240572; 20080317358; 20090144033; 20090208118; 20100040285; 20100082615; 20100092084; 20100098343; 20100189354; 20100191743; 20100226564; 20100318477; 20110026831; 20110040711; 20110052063; 20110072012; 20110091105; 20110137898; 20110184950; 20120045134; 20120076401; 20120143853, and 20120158739, the disclosures of which are incorporated herein by reference in their entireties.

Annotation of Print Jobs (S104, S106)

At S104, a user-annotatable note is presented to the user to capture the reasons for the print job.

In the exemplary embodiment, fewer than all of the print jobs in the set are annotated with notes. In some embodiments, a user may be permitted, or may choose, to apply a task note but no constraint note, or vice versa. In other embodiments, the user may be required, or may choose, to apply both a task note and a constraint note when annotating a print job.

The annotation can be made easy for the employees. For example, annotation may form a part of their recognized usual activity, or they may be provided with an incentive to annotate print jobs (such as an increase in their printing quota). In some embodiments, annotating documents allows them to access a more informative visualization of their printing history, e.g., through a Personal Assessment Tool (PAT), as described in copending U.S. Pub. Nos. 20110273739 and 20120033250 and U.S. application Ser. No. 13/774,020, filed Feb. 22, 2013, the disclosures of which are incorporated herein by reference in their entireties.

Various procedures for annotation are contemplated which can be used individually or in combination. For example, the annotation process can be initiated spontaneously by the users or when requested by the system, for example, to use active learning in order to validate or refine the actual clustering. Users may annotate one (or a set of) selected print job(s), thereby associating it to a corresponding one of a set of tasks and identifying constraints on printing. In another embodiment, the user may annotate a point in time or time frame with the task they were mainly performing at that time (e.g., reviewing papers for a conference, preparing for a customer visit, etc.) and the system identifies print jobs submitted during that time frame and associates them with that task.
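The time frame annotation described above can be sketched as follows. The record layout (dictionaries with `job_id`, `user_id` and `submitted` keys) and the function name are hypothetical and serve only to illustrate the association of print jobs with a task.

```python
from datetime import datetime


def label_jobs_in_time_frame(jobs, user_id, start, end, task):
    """Associate with `task` every job the user submitted in [start, end].

    Returns the identifiers of the jobs that received the task label.
    """
    labeled_ids = []
    for job in jobs:
        if job["user_id"] == user_id and start <= job["submitted"] <= end:
            job["task"] = task
            labeled_ids.append(job["job_id"])
    return labeled_ids


# Illustrative data only.
jobs = [
    {"job_id": 1, "user_id": "alice", "submitted": datetime(2014, 3, 3, 9, 15)},
    {"job_id": 2, "user_id": "alice", "submitted": datetime(2014, 3, 3, 16, 40)},
    {"job_id": 3, "user_id": "bob", "submitted": datetime(2014, 3, 3, 10, 0)},
    {"job_id": 4, "user_id": "alice", "submitted": datetime(2014, 3, 5, 11, 0)},
]
labeled = label_jobs_in_time_frame(
    jobs, "alice",
    datetime(2014, 3, 3, 8, 0), datetime(2014, 3, 3, 18, 0),
    "reviewing papers for a conference",
)
```

In this example only the two jobs submitted by the annotating user within the time frame are labeled; jobs from other users or other days are left untouched.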

In one embodiment, users can provide annotations when submitting print jobs. In this case, the annotations may be integrated into the existing printing selection process, e.g., within one of the already existing notification pop-up windows informing the user that his print job has been sent to or processed by the printer. In one study, it was shown that at least a significant portion of users would have been motivated to do so to pinpoint paper-based processes that should evolve to digital form (e.g., legal documents or forms requiring a signature).

Users can also provide annotations of print jobs or time frames at a later time from a print history view. In one embodiment, a graphical user interface which provides a Personal Assessment Tool (PAT), as described above, provides a print history view visualizing the user's print jobs over time. For example, the print history provides the document title and length. In addition, users may be provided with access to the visual document content, i.e., the document page images. From this information, users can associate a set of print jobs to the task to which they belong. Alternatively, users can specify a time frame and associate it to one or a set of tasks or to a particular event generating associated tasks. This indicates that the print jobs they initiated in this time frame correspond to the tasks they were primarily executing in that time frame.

It has been found that it is relatively easy to motivate at least a minimum number of users to participate in the annotation effort. Experiments with systems like the Personal Assessment Tool indicate that people will often be willing to provide annotation even if they will not gain any particular benefits for themselves. Also, the annotations provide the users with a method to make their work and barriers explicit to the management, which constitutes another incentive to annotate. It has been found that some people are annoyed about having to print documents and are therefore willing to provide spontaneous feedback about the nature of the print job and the reasons for submitting it.

In some embodiments, a reward may be provided to users who provide user-annotations for at least some of their respective submitted print jobs. For example, the system may motivate users to participate in the annotation effort by giving them better or more detailed feedback about their printing behavior in relation to their participation in the annotation of print jobs. In one embodiment, users may initially be provided with only a basic breakdown of their print jobs, e.g., according to the applications used to launch them (or no breakdown at all), whereas once they have annotated a minimum number of print jobs, they are provided with a more detailed breakdown of their jobs according to corresponding identified tasks, optionally together with annotated constraints for paper usage. With an increasing number of annotations, this information may be displayed in more and more detail. In addition, the system may provide participating users with access to complementary information extracted from their print jobs and involved in the clustering, such as the most frequent words observed in the user's print jobs' document titles. The incentive for the user is that, by providing a limited number of annotations to a community of users, which the system uses to improve and refine the overall clustering, the user benefits in return from all the annotations provided by the whole community of users. To encourage the user to provide even more annotations, the system may provide an incrementally improving breakdown. This means that to motivate the user, the system initially displays only a limited breakdown of information to the user, even if it could provide quite detailed information based on other users' prior annotations. The level of detail may thus increase with the number and value of the annotations provided.

In other embodiments, the users may be provided with one or more additional or alternative rewards, such as a cash payment, an increase in their print allocation, or other tangible or intangible rewards, as discussed, for example, in U.S. Pub. Nos. 20110273739 and 20120033250.

In the context of the PAT system, one suitable occasion to ask users to annotate some of their typical print jobs is during the self-assessment step: this is when users are reviewing their printing habits and will thus be naturally more inclined to provide annotations.

Print Job Clustering

According to an exemplary embodiment, the features extracted from the print jobs, such as the visual features associated to each print job, enable them to be automatically grouped into clusters. Each cluster can be considered as corresponding to a different task or note category. This helps to detect documents involved in the same process or task, since they are often associated with documents of similar structure. For example, it may be expected that documents associated with organizing travel (plane e-tickets, hotel reservations, travel map, etc.) or with the filing of intellectual property documents (invention disclosure, patent applications, copyright forms, publications) may occur more frequently in some groups than others. Recent developments in document clustering techniques show that it is possible to deal with millions of documents using compressed image signatures, with no loss of accuracy compared to a precise description of the images. See, for example, Perronnin, F. and Liu, Y. and Sanchez, J. and Poirier, H., “Large-scale image retrieval with compressed fisher vectors,” in Proc. Computer Vision and Pattern Recognition (CVPR), pp. 3384-3391, 2010.

Based on features that are extracted for each document and the subset of annotated documents, the annotation component 22 of the system learns clustering parameters for a set of clusters and propagates the labels to all the documents which have not yet been labeled. This may be performed using a supervised learning technique based on existing labels or a semi-supervised learning method. Exemplary methods for clustering are described, for example, in Seeger, M., “Learning with labelled and unlabelled data,” (Technical Report), University of Edinburgh (2001), and Zhu, Xiaojin, John Lafferty, and Ronald Rosenfeld, “Semi-supervised learning with graphs,” Diss. Carnegie Mellon University, Language Technologies Institute, School of Computer Science, 2005.

Example clustering algorithms which may be used herein include Nonnegative Matrix Factorization (NMF), Probabilistic Latent Semantic Analysis (PLSA), and Latent Dirichlet Allocation (LDA). See, for example, Lee, “Algorithms for nonnegative matrix factorization,” Advances in Neural Information Processing Systems, 13:556-562, 2001; Hofmann, “Unsupervised learning by probabilistic latent semantic analysis,” Machine Learning, 42(1/2):177-196, 2001; and Blei, et al., “Latent dirichlet allocation,” J. Machine Learning Res., 3:993-1022, 2003, for a discussion of these techniques.

As an example, in PLSA, a mixture model may be used in which the probability of a print job representation w given a label u is expressed as a sum over a set of classes z of the probability of the representation given a class and the probability of the class, given a label:

$P_{LSA}(w \mid u) = \sum_{z} P(w \mid z; \theta)\, P(z \mid u; \pi)$

where θ and π (and optionally also the number N of clusters) are parameters to be learned, e.g., via log-likelihood maximization which optimizes the values of the parameters. This can be approximated by expectation maximization. In the expectation step, the probability that the occurrence of representation w with label u can be explained by cluster z is computed, given current values of the parameters:

${P\left( {\left. z \middle| u \right.,w} \right)} = \frac{{P\left( {\left. z \middle| u \right.;\pi} \right)}{P\left( {\left. w \middle| z \right.;\theta} \right)}}{\Sigma_{z^{\prime}}{P\left( {\left. z^{\prime} \middle| u \right.;\pi} \right)}{P\left( {\left. w \middle| z^{\prime} \right.;\theta} \right)}}$

In the maximization step, the parameters are re-estimated, based on the probabilities computed in the expectation step:

$P(w \mid z; \theta) \propto \sum_{u} n(u, w)\, P(z \mid u, w)$,

where $n(u, w)\, P(z \mid u, w)$ represents how often representation w is associated with class z, and

$P(z \mid u; \pi) \propto \sum_{w} n(u, w)\, P(z \mid u, w)$,

where $n(u, w)\, P(z \mid u, w)$ represents how often label u is associated with class z.

The two steps are iterated until convergence or until a stoppingcriterion is met.
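The expectation and maximization steps can be sketched as follows. This is a minimal illustration under stated assumptions: the co-occurrence counts n(u, w) are supplied as a matrix `counts[u][w]`, the parameters are initialized randomly, and a fixed iteration count serves as the stopping criterion. The function names are hypothetical.

```python
import math
import random


def _normalized(row):
    total = sum(row)
    return [v / total for v in row]


def plsa(counts, num_classes, iterations=50, seed=0):
    """Fit P(w|z) and P(z|u) to counts[u][w] = n(u, w) by EM."""
    rng = random.Random(seed)
    U, W, Z = len(counts), len(counts[0]), num_classes
    p_w_z = [_normalized([rng.random() + 1e-3 for _ in range(W)]) for _ in range(Z)]
    p_z_u = [_normalized([rng.random() + 1e-3 for _ in range(Z)]) for _ in range(U)]
    for _ in range(iterations):
        new_w_z = [[0.0] * W for _ in range(Z)]
        new_z_u = [[0.0] * Z for _ in range(U)]
        for u in range(U):
            for w in range(W):
                if counts[u][w] == 0:
                    continue
                # E-step: P(z|u,w) proportional to P(z|u) P(w|z)
                joint = [p_z_u[u][z] * p_w_z[z][w] for z in range(Z)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(Z):
                    weight = counts[u][w] * joint[z] / total
                    new_w_z[z][w] += weight   # accumulates sum_u n(u,w) P(z|u,w)
                    new_z_u[u][z] += weight   # accumulates sum_w n(u,w) P(z|u,w)
        # M-step: renormalize the accumulated weights into distributions.
        p_w_z = [_normalized(r) if sum(r) > 0 else p_w_z[z] for z, r in enumerate(new_w_z)]
        p_z_u = [_normalized(r) if sum(r) > 0 else p_z_u[u] for u, r in enumerate(new_z_u)]
    return p_w_z, p_z_u


def log_likelihood(counts, p_w_z, p_z_u):
    """Sum over (u, w) of n(u, w) log P(w|u), with P(w|u) = sum_z P(w|z) P(z|u)."""
    total = 0.0
    for u, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                total += n * math.log(sum(p_w_z[z][w] * p_z_u[u][z]
                                          for z in range(len(p_w_z))))
    return total
```

Each EM iteration does not decrease the log-likelihood, so comparing `log_likelihood` between successive iterations is one possible convergence test in place of the fixed iteration count used here.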

The number of clusters may be predefined, e.g., in terms of an exact number of clusters or in terms of a maximum and/or minimum number of clusters. In other embodiments, the clustering algorithm is permitted to select an optimum number of clusters.

In the supervised case, the task labels as well as the print job representations are used in initial learning of the cluster parameters. In the semi-supervised case, the print jobs may be clustered based solely on the print job representations. The task labels are then used to refine the clusters, e.g., by merging two clusters which have print jobs having the same task labels.
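One possible reading of the refinement step in the semi-supervised case is sketched below: each cluster takes the most frequent task label among its annotated jobs, and clusters sharing that label are merged. The majority-vote rule, the naming of unlabeled clusters, and the function name are assumptions made for this illustration rather than the method of the exemplary embodiment.

```python
from collections import Counter


def merge_clusters_by_label(assignments, labels):
    """Merge clusters whose annotated jobs share the same majority task label.

    assignments: job id -> initial cluster id (from unsupervised clustering)
    labels:      job id -> task label, for the annotated subset of jobs
    Returns job id -> merged cluster name.
    """
    votes = {}
    for job, cluster in assignments.items():
        if job in labels:
            votes.setdefault(cluster, Counter())[labels[job]] += 1
    majority = {c: counter.most_common(1)[0][0] for c, counter in votes.items()}
    # Clusters without any annotated job keep a name derived from their id.
    return {job: majority.get(cluster, f"cluster-{cluster}")
            for job, cluster in assignments.items()}
```

For example, if clusters 0 and 1 each contain a job annotated with the task "travel", all jobs of both clusters end up in a single "travel" cluster, including the jobs that were never annotated.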

Note that in this specific application, the goal of the clustering is to obtain clusters corresponding to print job tasks, but not to categories specific to the document content itself, since it is unlikely that the content will reoccur frequently. However, the structure of the document is often characteristic of a task that is repeated over time. It has been found that for some applications, a visual signature, based on image features, may be a more useful feature than the other suggested features for grouping documents, such as the time of printing, the user ID, or the OCR output containing the actual words of a document. For example, the distribution of the words of two patents may be very different if they are on different topics, but the visual signatures may be fairly similar, due to similar graphic elements, font types, shapes and sizes of text blocks, and shapes in the figures. It has been found that evaluating how many print jobs are actual patents, travel request forms, and financial reports is informative about the paper-intensive processes that occur in the company.

In one exemplary embodiment, the labeled print job data can be used to identify parameters of clusters for the clustering model, which is then used to assign unlabeled print jobs to clusters based on their extracted features.

In another embodiment, the print job clustering system produces clusters of similar print jobs, initially roughly grouping, for example, print jobs related to similar basic types of documents, e.g., forms, letters, emails, presentations, etc. These initial clusters can then be refined, validated and associated to the corresponding tasks using the labels or other information input from the users who issued the jobs. Crowdsourcing information from the users lets them annotate a small portion of their print jobs, indicating to which task they correspond and also why the document was required to be in paper form. The system then uses the collected information to improve the clustering, and this process can iterate until the results obtained are consistent. This approach has the advantage of requiring only a limited number of annotations and thus only a limited number of users annotating their jobs. The number of annotations needed may depend on the number of different tasks within the organization, the variability of corresponding documents involved, and on the quality of the clustering mechanism.

Once the clustering parameters are learned, unlabeled print jobs can be automatically assigned to clusters based on their print job representations alone.

Consumable Usage (S114)

In computing the amount of consumable used in a given task, the analysis component 36 of the system may compute the sum of the sheets of paper used or pages printed in both the task-labeled and the unlabeled print jobs in a respective cluster. As will be appreciated, other methods for computing consumable usage, which take into account a number of factors, may be used, as described, for example, in U.S. Pub. No. 20120033250. In computing the amount of consumable associated with a given constraint, the analysis component 36 of the system may compute, for each of the set of constraints, the sum of sheets/printed pages in the print jobs in a given cluster which have been labeled with that constraint. The analysis component may infer that the unlabeled print jobs in the cluster would have the same constraint distribution as the labeled print jobs, in order to provide an amount of the consumable for each constraint for the entire set of labeled and unlabeled print jobs in the cluster.
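The per-constraint computation and the extrapolation to unlabeled jobs can be sketched as below. The sketch assumes that the constraint distribution is measured in pages (rather than in number of jobs) and that each job record is a dictionary with `pages` and `constraint` keys; both are illustrative assumptions.

```python
def constraint_usage(cluster_jobs):
    """Estimate pages per constraint for all jobs in one cluster.

    cluster_jobs: list of dicts with "pages" (int) and "constraint"
    (a label, or None when the job carries no constraint annotation).
    Pages of unlabeled jobs are distributed over the constraints in the
    same proportions as the pages of the labeled jobs.
    """
    labeled_pages = {}
    unlabeled_pages = 0
    for job in cluster_jobs:
        if job["constraint"] is None:
            unlabeled_pages += job["pages"]
        else:
            labeled_pages[job["constraint"]] = (
                labeled_pages.get(job["constraint"], 0) + job["pages"])
    labeled_total = sum(labeled_pages.values())
    if labeled_total == 0:
        return {}  # no annotation available to extrapolate from
    return {constraint: pages + unlabeled_pages * pages / labeled_total
            for constraint, pages in labeled_pages.items()}
```

For example, with 30 labeled pages attributed to a signature requirement, 10 to archiving, and 40 pages in unlabeled jobs, the estimate is 60 pages for the signature requirement and 20 for archiving, which together account for all 80 pages in the cluster.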

Identification/Annotation, Representation and Aggregation. (S104, S118)

The system is based on continuous tracking of the user's print consumption, as developed within PAT. Through the aggregation of total print volume, and of volumes for print jobs with particular types, attributes or print settings, it analyzes the user's print history and can thus highlight typical printing patterns over time, e.g., a high proportion of color versus black and white printing. The system thus detects and puts forward typical or atypical peaks regarding a work period, e.g., typically high consumption at the end of the month or week. Such patterns represent an area where employee annotations can be particularly valuable. Indeed, they hint towards recurrent work processes where improvements are most needed and will have the biggest impact.

Drawing upon this pattern identification, the system confronts the user with his patterns in order to engage him, through its central element, the “note”, to comment and explain these patterns, to contextualize, and to share these notes through a structured user dashboard with his peers. The system finally assembles all the notes in a group dashboard featuring various aggregation and management options. In the following, we illustrate how the system provides these functionalities and how the user interacts with the system.

Creating a Note

Recognizing the particular employee role and knowledge about print issues, the system proposes two different ways to create a note: on-the-fly and a posteriori. In the first case, the system stimulates the note creation for a particular print job; in the second case, it confronts the user with a description and visualization of particular observed (possibly problematic) printing patterns, allowing him to respond to that indication by annotating the visualized context, for instance explaining why the pattern occurs.

On-the-Fly Note

Immediately after the user issues a print job, the system prompts the user for an optional note. See FIG. 4.

FIG. 4: The system prompts the user to take a note on the fly.

If the user chooses to attach a note to the print job, the “note editor” opens. The user provides a title and a short description. When saving the note, the system automatically attaches the actual context, e.g., the user description/title and job attributes (hour/date, color or not, corresponding print job, etc.). The user further decides whether or not to disclose and attach the print job document content. See FIG. 5.

FIG. 5: Taking a note on the fly.

Contextual Note

The system also prompts the user to create notes from the visualization/description of his/her print patterns and history aggregated by the system. From within the personal dashboard, the system highlights to the user, through a textual description, his typical or recent and particularly costly consumption patterns regarding various printing aspects (color and black and white, duplex, long documents, etc.). The user can furthermore open an additional view with the associated graphic timeline (FIGS. 6 and 7).

FIG. 6: Highlighting a particular consumption pattern with the associated timeline.

By pointing out this information to the user, the system stimulates him/her to reflect on such typical patterns and to respond by creating notes to explain them and/or to suggest possible improvements. Here again, once the note is created, the system associates it with its particular context (highlighted issue and timeframe, user, etc.), enabling later grouping of related notes.

FIG. 7: The user dashboard.

Organization/Management of notes in the personal dashboard.

Personal and peer notes are gathered into the user dashboard. There, the note and its contents (title/description) and the context of the notes (print characteristics + related timeframe) appear in two linked areas, respectively bottom-left and bottom-right. The upper part of the dashboard shows the user's personal and his peers' print consumption. (See FIG. 8.)

FIG. 8: The structure of the user dashboard.

The navigation through the notes can be done either directly through the note pane (bottom-left) or through the textual pattern description areas (bottom-right). The user has the choice to display only his personal notes or also all existing peer notes. Notes are displayed from the most to least supported (see below). The user can then navigate through these notes (bottom-left) and the system will show the corresponding context in the corresponding area (bottom-right): textual description, related timeline and authorship. (See FIG. 9.) The user can also navigate through the pattern description area (bottom-right) and the system then updates the note pane with corresponding notes, if any. For peer notes, the user has the possibility to support/vote for them, assuming that he shares the concerned issue. Last but not least, the user can add more details to his existing notes. (See FIG. 10.)

FIG. 11: The user dashboard including the note pane (bottom-left) and the textual pattern descriptions (bottom-right). The user can navigate through either of them.

Group Dashboard.

To gather and elaborate on all the notes taken by its users, the systemprovides a group dashboard with various features to organize notes. Thisgroup dashboard allows to review and group the notes created during alimited time period and to collectively elaborate solutions andimprovements to recurrent print-processes issues.

The group dashboard is composed of three areas. On the upper left, the note selector allows scrolling through the notes. Notes are ordered from most to least supported, i.e., by peer votes received. From there, the users, either as a group (or, in a prior step, for instance only the manager preparing the meeting), review the different notes. On the upper right area, the system shows various information related to the currently selected/visible note. For an “on-the-fly” note, the attributes of the print job are shown and, if chosen by the user, the related document is accessible. For a consumption pattern note, the system gives access to the related context (textual description of the issue and related timeframe, user, etc.) from which the note was created.

Filtering Notes

To enable handling a large number of notes and/or to support the exploration and clustering of notes, the group dashboard includes a filtering pan (on its left) (FIG. 12).

From there, the user frames the review of the notes and limits it to certain types and/or groups of users. It also allows choosing how to order the notes: from most to least supported, chronologically or according to user types. Once filters are selected, only matching notes are presented in the note selector. As an extension, the system can also integrate state-of-the-art clustering methods based on language processing, such as those described in U.S. patent application Ser. No. 13/783,650, filed Mar. 4, 2013, by Willamowski et al., and entitled “SYSTEM AND METHOD FOR HIGHLIGHTING BARRIERS TO REDUCING PAPER USAGE”.
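The filtering and ordering behavior described above might be sketched in Python as follows. The `GroupNote` record, the example note types and user types, and the `filter_notes` function are hypothetical names introduced for illustration; they are not part of the described system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class GroupNote:
    """Hypothetical note record for the group dashboard."""
    title: str
    note_type: str   # e.g. "on-the-fly" or "pattern" (illustrative values)
    user_type: str   # e.g. "manager" or "assistant" (illustrative values)
    created: date
    votes: int


# The three orderings named in the description, expressed as sort keys.
ORDERINGS = {
    "support": lambda n: -n.votes,          # most to least supported
    "chronological": lambda n: n.created,   # oldest first
    "user_type": lambda n: n.user_type,     # grouped by type of user
}


def filter_notes(notes, note_types=None, user_types=None, order="support"):
    """Keep only notes matching the selected filters, then order them."""
    selected = [
        n for n in notes
        if (note_types is None or n.note_type in note_types)
        and (user_types is None or n.user_type in user_types)
    ]
    return sorted(selected, key=ORDERINGS[order])
```

Passing `None` for a filter leaves that dimension unrestricted, which corresponds to no filter being selected in the filtering pan.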

Organizing Notes.

When reviewing the different notes, users can drag notes into the working area. The objective is to select and group notes addressing similar issues. In the working area, after having selected related notes, users group them into a new aggregated and more general note. As a consequence, the initial regrouped notes disappear from the list, replaced by this new aggregated note.
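One possible way to model this aggregation step is sketched below in Python. The `WorkNote` class and the `aggregate` function are hypothetical, and summing the votes of the merged notes is an assumption; the description does not state how support carries over to the aggregated note.

```python
from dataclasses import dataclass, field


@dataclass
class WorkNote:
    """Hypothetical note as handled in the working area."""
    title: str
    votes: int = 0
    members: list = field(default_factory=list)  # notes merged into this one


def aggregate(note_list, selected_titles, new_title):
    """Replace the selected notes with one aggregated, more general note."""
    merged = [n for n in note_list if n.title in selected_titles]
    if len(merged) < 2:
        raise ValueError("select at least two notes to aggregate")
    remaining = [n for n in note_list if n.title not in selected_titles]
    # Assumption: the aggregated note inherits the combined support.
    combined = WorkNote(new_title, sum(n.votes for n in merged), merged)
    return remaining + [combined]
```

Keeping the original notes in `members` means the aggregated note can still show which individual annotations it was built from.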

Through these clustering and note management features, the group dashboard aims at supporting and facilitating a group discussion about printing issues and possible improvements. Such a meeting can be conducted through a systematic review of the notes, ranked by support, in chronological order or by type of employee. For each note, the group then discusses the related issue, also documented within the note details pan, and collectively elaborates solutions/improvements regarding the corresponding typical work processes.

The system and method described herein thus provide an approach to highlight barriers to moving from paper to digital format. This approach combines print job tracking, feature extraction and clustering with user annotation of the print jobs, where users annotate print jobs with information about the task to which they belong and why they have been printed on paper, i.e., why the corresponding task is paper-based and not digital. A benefit of this approach is that it is then easy to detect document types and reasons for printing which create substantial paper volume.

While the exemplary embodiment is directed to a print job workflow in which the print jobs are generated at work stations, scanning and/or fax print job workflows can be similarly annotated. Annotation can be done at the scan/fax devices or afterwards, through the exemplary graphical user interface 70. Combining printing and scanning/faxing workflows may provide more information about paper workflow.

Further, the system and method can be used to evaluate and measure the progress of the transition from paper to digital. For example, the change in the clusters over time and the constraints of the annotated print jobs in the clusters can be monitored. This allows the implementation of a paper-reduction procedure to be evaluated. The evaluation may include, for first and second sets of the print jobs submitted in respective different time periods, acquiring print job information, computing print job representations, providing for and receiving user-annotations, clustering the print jobs, and generating a representation of the set of print jobs, and then comparing the representation for the second set of print jobs with the representation for the first set of print jobs. For example, the method may be used to determine whether the paper usage corresponding to a specific task decreases once eReader solutions have been put in place.
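As a simplified illustration of such a comparison, the following Python sketch reduces each set of print jobs to the number of pages printed per cluster and reports the change between the two periods. Representing a print job as a `(cluster_label, page_count)` pair and using page totals as the compared representation are assumptions made for this example only.

```python
from collections import Counter


def pages_per_cluster(jobs):
    """Sum printed pages per cluster; jobs are (cluster_label, page_count) pairs."""
    totals = Counter()
    for cluster, pages in jobs:
        totals[cluster] += pages
    return totals


def compare_periods(first, second):
    """Return the change in printed pages per cluster from the first to the second period."""
    before, after = pages_per_cluster(first), pages_per_cluster(second)
    # A Counter returns 0 for missing keys, so clusters present in only one
    # period are handled without special cases.
    return {c: after[c] - before[c] for c in sorted(set(before) | set(after))}
```

A negative value for a cluster indicates that fewer pages were printed for the corresponding task in the second period, for instance after an eReader solution was introduced.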

The exemplary system and method profit from the willingness of print job submitters to annotate the documents, which is complemented with print job clustering and additional crowd-sourced annotation as needed to complete the understanding of the tasks, the corresponding print jobs and the motivations for printing.

Some portions of the detailed description herein are presented in terms of algorithms and symbolic representations of operations on data bits performed by conventional computer components, including a central processing unit (CPU), memory storage devices for the CPU, and connected display devices. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is generally perceived as a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, as apparent from the discussion herein, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The exemplary embodiment also relates to an apparatus for performing the operations discussed herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the methods described herein. The structure for a variety of these systems is apparent from the description above. In addition, the exemplary embodiment is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the exemplary embodiment as described herein.

A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For instance, a machine-readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; and electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), just to mention a few examples.

The methods illustrated throughout the specification may be implemented in a computer program product that may be executed on a computer. The computer program product may comprise a non-transitory computer-readable recording medium on which a control program is recorded, such as a disk, hard drive, or the like. Common forms of non-transitory computer-readable media include, for example, floppy disks, flexible disks, hard disks, magnetic tape, or any other magnetic storage medium, CD-ROM, DVD, or any other optical medium, a RAM, a PROM, an EPROM, a FLASH-EPROM, or other memory chip or cartridge, or any other tangible medium from which a computer can read and use.

Alternatively, the method may be implemented in transitory media, such as a transmittable carrier wave in which the control program is embodied as a data signal using transmission media, such as acoustic or light waves, such as those generated during radio wave and infrared data communications, and the like.

It will be appreciated that variants of the above-disclosed and other features and functions, or alternatives thereof, may be combined into many other different systems or applications. Various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art which are also intended to be encompassed by the following claims.

What is claimed is:
 1. A method for identifying constraints on reducing consumable usage comprising: acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job including a document to be printed; generating a print job contextual representation for each of the print jobs; providing for user-annotation of the submitted print jobs with a user-annotated note expressing a reason for printing the print job; receiving user-annotations for at least some of the submitted print jobs; generating a user dashboard for each of the set of users, the user dashboard displaying the user's print consumption history, the set of users print consumption history and one or more received user-annotated notes for the set of users; and generating a group dashboard, the group dashboard grouping user-annotations for the set of users by one or more of attributes associated with the print jobs; wherein the generating a print job contextual representation, providing for user-annotations of a note, receiving user-annotated notes, and generating a user dashboard is performed with a computer processor.
 2. The method of claim 1, wherein the computing of the print job contextual representation for each of the print jobs is based on features extracted from the print job information including extracting features selected from a group including: user ID; print job submission time; document title; document length; print job type; textual content features; visual content features; and page coverage features.
 3. The method of claim 2, wherein the features include visual features, the computing of the print job contextual representation including extracting low-level features from patches of a page of the document of the print job and generating a statistical representation of the page based on the extracted low-level features.
 4. The method of claim 2, wherein the features include textual features, the computing of the print job contextual representation includes words from the print job and generating a statistical representation of the print job based on the extracted words.
 5. The method of claim 1, wherein the generating of the representation of the set of print jobs comprises generating a graphical representation.
 6. A method for identifying constraints on reducing consumable usage comprising: acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job including a document to be printed; generating a print job contextual representation for each of the print jobs; providing for user-annotation of the submitted print jobs with a user-annotated note expressing a reason for printing the print job; receiving user-annotations for at least some of the submitted print jobs; generating a user dashboard for each of the set of users, the user dashboard displaying the user's print consumption history, the set of users print consumption history and one or more received user-annotated notes for the set of users; and evaluating a transition from paper to digital documents, comprising: for first and second sets of the print jobs, performing the acquiring print job information, generating the print job contextual representation, providing for user-annotation, receiving user-annotations, and comparing the representation for the second set of print jobs with the representation for the first set of print jobs; wherein generating a print job contextual representation, providing for user-annotations of a note, receiving user-annotated notes, generating a user dashboard, and evaluating a transition from paper to digital documents is performed with a computer processor.
 7. The method of claim 6, wherein the information for the second set of print jobs is acquired for print jobs submitted after submission of the first set of print jobs.
 8. The method of claim 1, wherein the acquiring print job information for a set of print jobs submitted for printing comprises acquiring print logs from a plurality of printers for the submitted print jobs.
 9. The method of claim 1, wherein the providing for user-annotation of the submitted print jobs comprises providing a reward to users who provide user-annotations for at least some of their respective submitted print jobs.
 10. A method for identifying constraints on reducing consumable usage comprising: acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job including a document to be printed; generating a print job contextual representation for each of the print jobs; providing for user-annotation of the submitted print jobs with a user-annotated note expressing a reason for printing the print job; receiving user-annotations for at least some of the submitted print jobs; generating a user dashboard for each of the set of users, the user dashboard displaying the user's print consumption history, the set of users print consumption history and one or more received user-annotated notes for the set of users; and wherein generating a print job contextual representation, providing for user-annotations of a note, receiving user-annotated notes, and generating a user dashboard is performed with a computer processor, and wherein receiving user-annotations for at least some of the submitted print jobs includes receiving annotations for fewer than all of the print jobs in the set, and assigning print jobs without an annotation to a respective cluster based on the respective print job contextual representation.
 11. (canceled)
 12. The method according to claim 1, wherein user-annotation of the submitted print job is prompted immediately after a user issues the print job.
 13. The method according to claim 1, further comprising: providing for user-annotation of a note associated with a detected pattern of printing by a user.
 14. A computer program product comprising a non-transitory recording medium storing instructions which, when executed by a computer processor, perform the method of claim 1.
 15. A system comprising memory which stores instructions for performing the method of claim 1 and a processor in communication with the memory which implements the instructions.
 16. A system for identifying constraints on reducing consumable usage comprising: a job tracking component for acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job including a document to be printed; a print job contextual representation generation component for generating a print job contextual representation for each of the print jobs; an annotation component for receiving user-annotations for at least some of the submitted print jobs, the user user-annotations including a user-annotated note expressing a reason for printing the print job; an analysis component for generating a representation of the set of print jobs which represents reasons for printing of print jobs based on the users' annotations; a user dashboard component for generating a user dashboard for each of the set of users, the user dashboard displaying the user's print consumption history, the set of user's print consumption history and one or more user annotated notes for the set of users; a group dashboard component for generating a group dashboard grouping user-annotations for the set of users by one or more attributes associated with the print jobs; and a processor which implements the job tracking component, the print job contextual representation generation component, the annotation component, the analysis component, the user dashboard component, and the group dashboard component.
 17. The system according to claim 16, wherein the computing of the print job contextual representation for each of the print jobs is based on features extracted from the print job information including extracting features selected from a group including: user ID; print job submission time; document title; document length; print job type; textual content features; visual content features; and page coverage features.
 18. (canceled)
 19. The system according to claim 16, wherein user-annotation of the submitted print job is prompted immediately after a user issues the print job.
 20. The system according to claim 16, wherein user-annotation of a note is associated with a detected pattern of printing by a user.
 21. A method for identifying constraints on reducing consumable usage comprising: acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job comprising a document to be printed; computing a print job representation for each of the print jobs based on features extracted from the print job information, the features including a statistical representation of low-level features extracted from patches of a page of the document; receiving user-annotations for at least some of the submitted print jobs whereby submitted print jobs are annotated with a user annotated note expressing a reason for printing the print job; partitioning the print jobs into clusters based on the print job representations and annotations; and generating a representation of the set of print jobs which represents reasons for printing of print jobs in at least one of the clusters, based on the users' annotations, wherein the computing of the print job representation, receiving user-annotations, partitioning the print jobs, and generating of the representation of the set of print jobs are performed with a computer processor.
 22. A system for identifying constraints on reducing consumable usage and evaluating a transition from paper to digital documents comprising: a job tracking component for acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job including a document to be printed; a print job contextual representation generation component for generating a print job contextual representation for each of the print jobs; an annotation component for receiving user-annotations for at least some of the submitted print jobs, the user user-annotations including a user-annotated note expressing a reason for printing the print job; an analysis component for generating a representation of the set of print jobs which represents reasons for printing of print jobs based on the users' annotations; a user dashboard component for generating a user dashboard for each of the set of users, the user dashboard displaying the user's print consumption history, the set of user's print consumption history and one or more user annotated notes for the set of users; and a processor which implements the job tracking component, the print job contextual representation generation component, the annotation component, the analysis component, and the user dashboard component; wherein for a first and a second set of the print jobs, the processor implements the job tracking component, the print job contextual representation generation component, the annotation component, the analysis component, and the user dashboard component, and compares a representation of the second set of print jobs with a representation of the first set of print jobs.
 23. A system for identifying constraints on reducing consumable usage comprising: a job tracking component for acquiring print job information for a set of print jobs submitted for printing by a set of users, each print job including a document to be printed; a print job contextual representation generation component for generating a print job contextual representation for each of the print jobs; an annotation component for receiving user-annotations for at least some of the submitted print jobs, the user user-annotations including a user-annotated note expressing a reason for printing the print job, wherein receiving user-annotations for at least some of the submitted print jobs includes receiving annotations for fewer than all of the print jobs in the set; a clustering component for clustering the print jobs into clusters based on the print job contextual representations, wherein the annotation component receives user-annotations for fewer than all of the submitted print jobs, and the clustering component assigns the submitted print jobs without a user-annotation to a respective cluster based on the respective print job contextual representation; an analysis component for generating a representation of the set of print jobs which represents reasons for printing of print jobs based on the users' annotations; a user dashboard component for generating a user dashboard for each of the set of users, the user dashboard displaying the user's print consumption history, the set of user's print consumption history and one or more user annotated notes for the set of users; and a processor which implements the job tracking component, the print job contextual representation annotation component, the clustering component, the analysis component, and the user dashboard component.